BERT (Bidirectional Encoder Representations from Transformers) has revolutionized natural language processing by leveraging deep bidirectional representations. One of the tasks BERT is pre-trained on is Masked Language Modeling (MLM): given a sentence such as “He was a young wizard, his name was Harry [MASK].”, the model should predict that the masked token is “Potter”.
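
To make this concrete, here is a minimal sketch of masked-token prediction using the Hugging Face `transformers` fill-mask pipeline with the `bert-base-uncased` checkpoint (my choice of library and model for illustration, not necessarily what the original experiments used):

```python
# pip install transformers torch
from transformers import pipeline

# Load a pre-trained BERT checkpoint behind the fill-mask pipeline.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Ask BERT to fill in the [MASK] token.
predictions = unmasker("He was a young wizard, his name was Harry [MASK].")

# Show the top candidate tokens with their probabilities.
for p in predictions:
    print(f"{p['token_str']:>10}  {p['score']:.4f}")
```

If all goes well, “potter” should rank among the top candidates, since BERT can use context from both sides of the mask to make its prediction.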